Comparative Analysis of Deepfake Detection Models: New Approaches and Perspectives
The growing threat posed by deepfake videos, capable of manipulating reality and disseminating misinformation, drives the urgent need for effective detection methods. This work investigates and compares different approaches for identifying deepfakes, focusing on the GenConViT model and its performance relative to other architectures present in the DeepfakeBenchmark. To contextualize the research, the social and legal impacts of deepfakes are addressed, as well as the technical fundamentals of their creation and detection, including digital image processing, machine learning, and artificial neural networks, with emphasis on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Transformers. The performance evaluation of the models was conducted using relevant metrics and new datasets established in the literature, such as WildDeepfake and DeepSpeak, aiming to identify the most effective tools in the battle against misinformation and media manipulation. The obtained results indicate that GenConViT, after fine-tuning, exhibited superior accuracy (93.82%) and generalization capacity, surpassing other architectures in the DeepfakeBenchmark on the DeepSpeak dataset. This study contributes to the advancement of deepfake detection techniques, supporting the development of more robust and effective solutions against the dissemination of false information.
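The comparison above rests on standard classification metrics. As a minimal sketch (not the paper's evaluation code), the snippet below shows how accuracy and ROC AUC could be computed for a binary real-vs-deepfake classifier; the label and score arrays are hypothetical placeholders.

```python
# Hypothetical per-video labels (0 = real, 1 = deepfake) and model scores;
# these values are illustrative, not results from the paper.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.10, 0.92, 0.71, 0.43, 0.85, 0.18, 0.64, 0.97])

y_pred = (y_score >= 0.5).astype(int)   # threshold the scores at 0.5
print("accuracy:", accuracy_score(y_true, y_pred))
print("ROC AUC :", roc_auc_score(y_true, y_score))
```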
Joint State-Parameter Observer-Based Robust Control of a UAV for Heavy Load Transportation
Rego, Brenner S., Cardoso, Daniel N., Terra, Marco H., Raffo, Guilherme V.
Taking advantage of their versatility and autonomous operation, unmanned aerial vehicles (UAVs) can be used for aerial load transportation, with many applications such as vertical replenishment of seaborne vessels [11], deployment of supplies in search-and-rescue missions [1], package delivery, and landmine detection [2]. Aerial load transportation using UAVs is a challenging task in terms of modeling and control. The load may be connected to the UAV either rigidly or by means of a rope, which changes its dynamics considerably. In addition, the load's physical parameters are often unknown in practice, and their knowledge is usually necessary to effectively accomplish the task. A model-free control approach based on trajectory generation by reinforcement learning has been proposed in [7] for path tracking of the load using a quadrotor UAV (QUAV). This work was in part supported by the project INCT (National Institute of Science and Technology) for Cooperative Autonomous Systems Applied to Security and Environment under the grants CNPq 465755/2014-3 and FAPESP 2014/50851-0, and by the Brazilian agencies CAPES under the grant numbers 88887.136349/2017-00
Extracting Deformation-Aware Local Features by Learning to Deform
Martins, Renato (Universidade Federal de Minas Gerais, Université Bourgogne Franche-Comté), Cadar, Felipe
Despite the advances in extracting local features achieved by handcrafted and learning-based descriptors, they are still limited by the lack of invariance to nonrigid transformations. In this paper, we present a new approach to compute features from still images that are robust to non-rigid deformations to circumvent the problem of matching deformable surfaces and objects. Our deformation-aware local descriptor, named DEAL, leverages a polar sampling and a spatial transformer warping to provide invariance to rotation, scale, and image deformations. We train the model architecture end-to-end by applying isometric non-rigid deformations to objects in a simulated environment as guidance to provide highly discriminative local features. The experiments show that our method outperforms state-of-the-art handcrafted, learning-based image, and RGB-D descriptors in different datasets with both real and realistic synthetic deformable objects in still images.
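The abstract describes a polar sampling step followed by a learned warp. The sketch below illustrates only the generic polar-sampling idea around a keypoint using PyTorch's grid_sample; the function name, grid resolution, and patch size are assumptions for illustration, not DEAL's actual implementation.

```python
import torch
import torch.nn.functional as F

def polar_sample(image, center, radius, n_rho=32, n_theta=32):
    """Sample a (1, C, n_rho, n_theta) polar patch around `center` (x, y) in pixels."""
    _, _, H, W = image.shape
    rho = torch.linspace(radius / n_rho, radius, n_rho)      # radial bins
    theta = torch.linspace(0.0, 2 * torch.pi, n_theta)       # angular bins
    rr, tt = torch.meshgrid(rho, theta, indexing="ij")
    xs = center[0] + rr * torch.cos(tt)                      # pixel x-coordinates
    ys = center[1] + rr * torch.sin(tt)                      # pixel y-coordinates
    # normalize coordinates to [-1, 1] as expected by grid_sample
    grid = torch.stack((2 * xs / (W - 1) - 1, 2 * ys / (H - 1) - 1), dim=-1)
    return F.grid_sample(image, grid.unsqueeze(0), align_corners=True)

patch = polar_sample(torch.rand(1, 3, 256, 256), center=(128.0, 128.0), radius=32.0)
print(patch.shape)  # torch.Size([1, 3, 32, 32])
```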
Fire and Smoke Datasets in 20 Years: An In-depth Review
Boroujeni, Sayed Pedram Haeri, Mehrabi, Niloufar, Afghah, Fatemeh, McGrath, Connor Peter, Bhatkar, Danish, Biradar, Mithilesh Anil, Razi, Abolfazl
Fire and smoke phenomena pose a significant threat to the natural environment, ecosystems, and global economy, as well as human lives and wildlife. In these circumstances, there is a demand for more sophisticated and advanced technologies to implement an effective strategy for early detection, real-time monitoring, and minimizing the overall impacts of fires on ecological balance and public safety. Recently, the rapid advancement of Artificial Intelligence (AI) and Computer Vision (CV) frameworks has substantially boosted the momentum for developing efficient fire management systems. However, these systems extensively rely on the availability of adequate and high-quality fire and smoke data to create proficient Machine Learning (ML) methods for various tasks, such as detection and monitoring. Although fire and smoke datasets play a critical role in training, evaluating, and testing advanced Deep Learning (DL) models, a comprehensive review of the existing datasets is still lacking. For this purpose, we provide an in-depth review to systematically analyze and evaluate fire and smoke datasets collected over the past 20 years. We investigate the characteristics of each dataset, including type, size, format, collection methods, and geographical diversity. We also review and highlight the unique features of each dataset, such as imaging modalities (RGB, thermal, infrared) and their applicability for different fire management tasks (classification, segmentation, detection). Furthermore, we summarize the strengths and weaknesses of each dataset and discuss their potential for advancing research and technology in fire management. Ultimately, we conduct extensive experimental analyses across different datasets using several state-of-the-art algorithms, such as ResNet-50, DeepLab-V3, and YOLOv8.
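As a minimal sketch of the kind of classification baseline the experimental analyses mention (ResNet-50), the snippet below fine-tunes a pretrained ResNet-50 head for fire/smoke classification; the class set, hyperparameters, and dummy batch are placeholders, not tied to any of the reviewed datasets.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3  # e.g., fire, smoke, no-fire (illustrative class set)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the classifier head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```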
Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects
Nunes, Henrique, Figueiredo, Eduardo, Rocha, Larissa, Nadi, Sarah, Ferreira, Fischer, Esteves, Geanderson
Large Language Models (LLMs) have gained attention for addressing coding problems, but their effectiveness in fixing code maintainability issues remains unclear. This study evaluates LLMs' capability to resolve 127 maintainability issues from 10 GitHub repositories. We use zero-shot prompting for Copilot Chat and Llama 3.1, and few-shot prompting with Llama only. The LLM-generated solutions are assessed for compilation errors, test failures, and new maintainability problems. Llama with few-shot prompting successfully fixed 44.9% of the methods, while Copilot Chat and Llama zero-shot fixed 32.29% and 30%, respectively. However, most solutions introduced errors or new maintainability issues. We also conducted a human study with 45 participants to evaluate the readability of 51 LLM-generated solutions. The human study showed that 68.63% of participants observed improved readability. Overall, while LLMs show potential for fixing maintainability issues, their introduction of errors highlights their current limitations.
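To make the zero-shot versus few-shot distinction concrete, the sketch below assembles both prompt styles for a maintainability fix; the issue descriptions and exemplar are hypothetical, not drawn from the study's repositories or its actual prompt templates.

```python
def zero_shot_prompt(issue, code):
    # Ask directly, with no solved examples in the context.
    return (f"Fix the following maintainability issue without changing behavior.\n"
            f"Issue: {issue}\n\nCode:\n{code}\n\nReturn only the fixed method.")

def few_shot_prompt(examples, issue, code):
    # Prepend a few solved (issue, before, after) exemplars to the same request.
    shots = "\n\n".join(
        f"Issue: {ex_issue}\nBefore:\n{before}\nAfter:\n{after}"
        for ex_issue, before, after in examples
    )
    return shots + "\n\n" + zero_shot_prompt(issue, code)

example = ("Magic number", "def area(r): return 3.14159 * r * r",
           "import math\ndef area(r): return math.pi * r * r")
print(few_shot_prompt([example], "Long method", "def process(data): ..."))
```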
Online path planning for kinematic-constrained UAVs in a dynamic environment based on a Differential Evolution algorithm
Freitas, Elias J. R., Cohen, Miri Weiss, Guimarães, Frederico G., Pimenta, Luciano C. A.
The increasing use of fixed-wing Unmanned Aerial Vehicles (UAVs) is driven by several factors, such as long range, high speeds, and superior payload capacity compared to quadrotors. Combined with motion planning strategies, these advantages enable fixed-wing UAVs also to navigate [...]. In our recent work [5], we proposed a novel Differential Evolution-based path planner that handles kinematic-constrained UAVs. In this approach, we also show that using the Non-Uniform Rational B-spline (NURBS) curve as the path representation can provide a more flexible planner than using the B-spline representation.
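As a toy illustration of evolving spline control points with Differential Evolution (here with a plain B-spline; a NURBS additionally carries per-point weights), the sketch below minimizes path length plus an obstacle penalty. The obstacle, bounds, and cost function are assumptions for illustration, not the paper's formulation.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import differential_evolution

start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
obstacle, radius = np.array([5.0, 0.0]), 2.0  # circular no-fly zone

def path_cost(x):
    ctrl = np.vstack([start, x.reshape(3, 2), goal])       # 5 control points
    k = 3
    knots = np.concatenate([[0.0] * (k + 1), [0.5], [1.0] * (k + 1)])  # clamped
    curve = BSpline(knots, ctrl, k)(np.linspace(0.0, 1.0, 100))
    length = np.sum(np.linalg.norm(np.diff(curve, axis=0), axis=1))
    clearance = np.linalg.norm(curve - obstacle, axis=1) - radius
    penalty = 100.0 * np.sum(np.maximum(0.0, -clearance))  # punish obstacle intrusion
    return length + penalty

result = differential_evolution(path_cost, bounds=[(-5, 15), (-6, 6)] * 3, seed=0)
print("best cost:", result.fun)
```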
A study on the effects of mixed explicit and implicit communications in human-virtual-agent interactions
Campos, Ana Christina Almada, Adorno, Bruno Vilhena
Communication between humans and robots (or virtual agents) is essential for interaction and often inspired by human communication, which uses gestures, facial expressions, gaze direction, and other explicit and implicit means. This work presents an interaction experiment where humans and virtual agents interact through explicit (gestures, manual entries using mouse and keyboard, voice, sound, and information on screen) and implicit (gaze direction, location, facial expressions, and eyebrow raises) communication to evaluate the effect of mixed explicit-implicit communication against purely explicit communication. Results obtained using Bayesian parameter estimation show that the number of errors and task execution time did not significantly change when mixed explicit and implicit communications were used, nor did the perceived efficiency of the interaction. In contrast, acceptance, sociability, and transparency of the virtual agent increased when using mixed communication modalities (88.3%, 92%, and 92.9% of the effect size posterior distribution of each variable, respectively, were above the upper limit of the region of practical equivalence). This suggests that task-related measures, such as time, number of errors, and perceived efficiency of the interaction, were not influenced by the communication type in our particular experiment. However, the improvement of subjective measures related to the virtual agent, such as acceptance, sociability, and transparency, suggests that humans are more receptive to mixed explicit and implicit communications.
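A minimal sketch of the ROPE-style summary reported above: given posterior samples of an effect size, compute the fraction lying above the upper limit of the region of practical equivalence. The samples and ROPE bounds below are synthetic placeholders, not the experiment's posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
effect_samples = rng.normal(loc=0.6, scale=0.3, size=10_000)  # hypothetical posterior
rope = (-0.1, 0.1)                                            # illustrative ROPE

above = np.mean(effect_samples > rope[1])
inside = np.mean((effect_samples >= rope[0]) & (effect_samples <= rope[1]))
print(f"{above:.1%} of the posterior mass above the ROPE, {inside:.1%} inside it")
```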
A Semi-Lagrangian Approach for Time and Energy Path Planning Optimization in Static Flow Fields
Campos, Víctor C. da S., Neto, Armando A., Macharet, Douglas G.
In this context, new challenges arise when robotic systems address not just a singular objective but multiple and often conflicting goals. These objectives can range from minimizing travel time and energy consumption simultaneously to optimizing factors like safety and resource allocation [2]. In single-objective approaches, the most commonly prioritized factors are the path's length [3, 4] and travel time [5, 6]. However, by incorporating additional attributes, such as path safety/vulnerability and smoothness [7, 8], we can significantly improve both the quality and the applicability of the results. Regarding the more general class of routing problems, where a sequence of visits is demanded, a multi-objective variant of the Orienteering Problem (OP) was proposed in [9], where the goal was to maximize the cumulative reward obtained while concurrently minimizing the exposure to sensors deployed in the environment. Furthermore, it is also imperative to acknowledge that, in numerous domains, environmental dynamics substantially influence the trajectories and behaviors of the vehicles. This is particularly evident in fields such as aerospace, where factors like air density, wind patterns, and gravitational forces intricately shape aircraft flight paths [10].
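To make the time-versus-energy conflict concrete, the toy calculation below crosses a fixed segment against a constant opposing flow at different commanded speeds: travel time falls while a simple quadratic energy model grows. The flow, energy model, and numbers are assumptions for illustration only, not the paper's semi-Lagrangian formulation.

```python
import numpy as np

segment = np.array([100.0, 0.0])   # displacement to cover (m)
flow_x = -1.0                      # opposing current along the segment (m/s)

for v_cmd in (2.0, 3.0, 5.0):      # vehicle speed relative to the flow
    ground_speed = v_cmd + flow_x  # net speed along the segment
    t = np.linalg.norm(segment) / ground_speed
    energy = v_cmd ** 2 * t        # toy quadratic power model integrated over time
    print(f"v={v_cmd:.1f} m/s  time={t:6.1f} s  energy={energy:7.1f} (arbitrary units)")
```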
Sabiá-2: A New Generation of Portuguese Large Language Models
Almeida, Thales Sales, Abonizio, Hugo, Nogueira, Rodrigo, Pires, Ramon
We introduce Sabiá-2, a family of large language models trained on Portuguese texts. The models are evaluated on a diverse range of exams, including entry-level tests for Brazilian universities, professional certification exams, and graduate-level exams for various disciplines such as accounting, economics, engineering, law, and medicine. Our results reveal that our best model so far, Sabiá-2 Medium, matches or surpasses GPT-4's performance in 23 out of 64 exams and outperforms GPT-3.5 in 58 out of 64 exams. Notably, specialization has a significant impact on a model's performance without the need to increase its size, allowing us to offer Sabiá-2 Medium at a price per token that is 10 times cheaper than GPT-4. Finally, we identified that math and coding are key abilities that need improvement.
Leveraging Self-Supervised Learning for Scene Recognition in Child Sexual Abuse Imagery
Valois, Pedro H. V., Macedo, João, Ribeiro, Leo S. F., Santos, Jefersson A. dos, Avila, Sandra
Crime in the 21st century is split into a virtual and a real world. However, the former has become a global menace to people's well-being and security in the latter. The challenges it presents must be faced with unified global cooperation, and we must rely more than ever on automated yet trustworthy tools to combat the ever-growing nature of online offenses. Over 10 million child sexual abuse reports are submitted to the US National Center for Missing & Exploited Children every year, and over 80% originate from online sources. Therefore, investigation centers and clearinghouses cannot manually process and correctly investigate all imagery. In light of that, reliable automated tools that can securely and efficiently deal with this data are paramount. In this sense, the scene recognition task looks for contextual cues in the environment, making it possible to group and classify child sexual abuse data without training on sensitive material. The scarcity and limitations of working with child sexual abuse images lead to self-supervised learning, a machine-learning methodology that leverages unlabeled data to produce powerful representations that can be more easily transferred to target tasks. This work shows that self-supervised deep learning models pre-trained on scene-centric data can reach 71.6% balanced accuracy on our indoor scene classification task and, on average, 2.2 percentage points better performance than a fully supervised version. We cooperate with Brazilian Federal Police experts to evaluate our indoor classification model on actual child abuse material. The results demonstrate a notable discrepancy between the features observed in widely used scene datasets and those depicted in sensitive materials.
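The headline figure above is balanced accuracy, i.e., the mean per-class recall. The sketch below computes it on hypothetical indoor-scene predictions; the labels are synthetic and unrelated to any sensitive dataset.

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = ["bedroom", "bathroom", "bedroom", "living_room", "bathroom", "bedroom"]
y_pred = ["bedroom", "bedroom",  "bedroom", "living_room", "bathroom", "living_room"]

print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("per-class recall :", recall_score(y_true, y_pred, average=None))
```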